Synchronization of multiple real-time transport protocol sessions

ABSTRACT

A packet generator for generating packets from an input signal configured to: generate at least one first signal, dependent on the input signal, the first signal comprising a first relative time value; generate at least one second signal, dependent on the input signal and associated with the at least one first signal; and generate at least one indicator associated with each of the at least one second signal, each indicator dependent on the first relative time value.

RELATED APPLICATION

This application was originally filed as PCT Application No. PCT/EP2007/063202 filed Dec. 3, 2007.

FIELD OF THE INVENTION

The present invention relates to media transport, and in particular, but not exclusively, to transport of encoded speech, audio or video signals.

BACKGROUND OF THE INVENTION

Audio signals, like speech or music, are encoded for example for enabling an efficient transmission or storage of the audio signals.

Audio encoders and decoders are used to represent audio based signals, such as music and background noise. These types of coders typically do not utilise a speech model for the coding process, rather they use processes for representing all types of audio signals, including speech.

Speech encoders and decoders (codecs) are usually optimised for speech signals, and can operate at either a fixed or variable bit rate.

An audio codec can also be configured to operate with varying bit rates. At lower bit rates, such an audio codec may work with speech signals at a coding rate equivalent to a pure speech codec. At higher bit rates, the audio codec may code any signal including music, background noise and speech, with higher quality and performance.

In some audio codecs the input signal is divided into a limited number of bands. Each of the band signals may be quantized. From the theory of psychoacoustics it is known that the highest frequencies in the spectrum are perceptually less important than the low frequencies. This in some audio codecs is reflected by a bit allocation where fewer bits are allocated to high frequency signals than low frequency signals.

One emerging trend in the field of media coding is so-called layered codecs, for example the ITU-T Embedded Variable Bit-Rate (EV-VBR) speech/audio codec and the ITU-T Scalable Video Codec (SVC). The scalable media data consists of a core layer, which is always needed to enable reconstruction at the receiving end, and one or several enhancement layers that can be used to provide added value to the reconstructed media (e.g. improved media quality or increased robustness against transmission errors).

The scalability of these codecs may be used at the transmission level, e.g. for controlling the network capacity or shaping a multicast media stream to facilitate operation with participants behind access links of different bandwidth. At the application level the scalability may be used for controlling such variables as computational complexity, encoding delay, or desired quality level. Note that whilst in some scenarios the scalability can be applied at the transmitting end point, there are also operating scenarios where it is more suitable that an intermediate network element is able to perform the scaling.

For example, this scalable layered approach to audio encoding may be employed in telephony. In the packet switched network transmission protocols typically employed for Voice over IP (VoIP), the layer-encoded audio signal is transmitted in packets according to the Real-time Transport Protocol (RTP) encapsulated in the User Datagram Protocol (UDP), further encapsulated in the Internet Protocol (IP).

In such media transport arrangements scalable codecs can be handled in one of two ways. In the first arrangement the enhancement layers may be transmitted in the same packets, i.e. in the same RTP session as the core layer data.

The approach of carrying all of the layers (of a media frame) in a single packet provides low overhead and easy cross-layer synchronization as the receiver decoder knows that all information for a certain media frame is carried in the same packet, which implicitly also provides cross-layer media synchronization. However, the drawback of this approach is that any intermediate network element carrying out a scaling operation needs to be aware of the details of the packet and media content structure, and then carry out a filtering operation by reading, parsing, and then modifying the packet contents.

The second approach is that the enhancement layers (or the subsets of enhancement layers) may be transmitted in packet stream(s) separate from the core layer data. This second approach also requires a signalling mechanism that can be used to synchronize the separate packet data streams carrying layers of the same media source.

However, the second approach, employing separate data streams for (subsets of) layers, provides easier scaling opportunities because the scaling operations can be realized by discarding packets of some data streams, and therefore does not require the packets to be modified.

This approach therefore does not require in-depth knowledge about the packet structure; instead the scaling operations can be performed based on information about the relationship between the data streams.

Multiple data streams using multiple RTP sessions is the traditional way to transmit layered media data within an RTP framework (the approach is often referred to as scalable multicast).

Synchronization of multiple data streams is obviously a problem when the receiver is reconstructing a media frame using layers distributed across multiple RTP sessions.

The timeline of a media stream received in RTP packets can be reconstructed using Time Stamp (TS) information included in the RTP header. The RTP TS provides information on the temporal difference compared to other RTP packets transmitted in the same RTP session, which enables putting each received media frame in its correct place in the timeline.

However, the initial value of the RTP TS of an RTP session is a random value. Thus the RTP TS does NOT indicate an absolute time (i.e. “wallclock time”) but only a relative time or timing reference within the RTP session. Note that this “randomness” may be considered to be an unknown offset from an absolute time. The unknown offset may be, and is likely to be, different in each RTP session.

Thus two or more RTP sessions cannot be synchronized based on their RTP TS values only. This also holds for separate RTP sessions used to carry (subsets of) layers of a layered encoding.

The prior-art mechanism for synchronizing multiple RTP sessions is to use the control protocol associated with the transport protocol to send additional information. Thus in the prior art Real-Time Control Protocol (RTCP) reports may be transmitted within each session. The transmitter of these RTCP reports includes both a timing reference (NTP) and a sending instant in the RTP TS domain in the RTCP Sender Reports (SR) transmitted according to a specified pattern. Furthermore, an RTCP packet also includes an identifier (SSRC) that is used to map an RTCP packet to the correct stream of RTP packets.

The receiver, on receiving the control protocol packets, may use the timing reference and the RTP time stamps within these control protocol packets to compute the RTP TS offset from the timing reference (NTP) for each of the RTP sessions it receives. These offset values may then be used to match the timing of the media data received in separate RTP sessions. Therefore, for example, the receiver may combine layers of a media frame received in multiple RTP sessions.
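The prior-art offset computation described above may be illustrated by the following minimal Python sketch. It is not taken from this application: the SenderReport class, its field names, the rtp_ts_to_wallclock function, the 16 kHz clock rate and all numeric values are assumptions chosen for illustration only. The sketch maps an RTP TS to the timing reference using the (NTP, RTP TS) pair of one sender report, and shows that two sessions with different random TS origins can then be placed on a common timeline.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 1 << 32  # RTP time stamps are 32-bit values and wrap around


@dataclass
class SenderReport:
    """Timing pair carried by one RTCP SR (illustrative field names)."""
    ntp_seconds: float  # timing reference (wallclock), in seconds
    rtp_ts: int         # the same instant in the session's RTP TS domain


def rtp_ts_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate: int) -> float:
    """Map an RTP TS to wallclock time using the session's sender report."""
    # Signed difference modulo 2**32, so wrap-around and packets sent
    # slightly before the report are both handled.
    delta = (rtp_ts - sr.rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return sr.ntp_seconds + delta / clock_rate


# Two sessions carrying layers of the same source, each with its own
# (hypothetical) random RTP TS origin.
core_sr = SenderReport(ntp_seconds=1000.0, rtp_ts=48_000)
enh_sr = SenderReport(ntp_seconds=1000.0, rtp_ts=4_000_000_000)

# The same media frame, 320 clock ticks after the reports, seen in each session.
core_time = rtp_ts_to_wallclock(48_320, core_sr, clock_rate=16_000)
enh_time = rtp_ts_to_wallclock(4_000_000_320, enh_sr, clock_rate=16_000)
```

In this sketch the two packets map to the same wallclock instant even though their RTP TS values are unrelated, but only once a sender report has been received on each session.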

However, a problem associated with such a system is that it requires RTCP SRs for each of the RTP sessions to be received before any full reconstruction of a media frame can be carried out. In the case of layered media, in practice this means that only the core layer of any layered encoding scheme is available until the synchronization information is available, i.e. until the first RTCP packets (on each of the sessions) are received.

A less complex approach, not relying on the control protocol associated with the transport protocol, has been to pre-synchronize the RTP TS across RTP sessions in the transmitting end-point. In other words the “random” initial value of the RTP TS is set to be the same value for each RTP session.

Whilst this may provide a simple cross-session synchronization mechanism without the need to transmit additional data, it is not in line with the RTP specification, and existing RTP implementations may therefore not support it.

Furthermore such an approach would provide synchronization at the RTP (header) level, but only for the subset of RTP payloads which were pre-synchronized. Such payload type-dependent processing at RTP level may be considered a non-desirable feature in a system handling multiple payload types.

A further prior-art solution has been to attach additional information for each transmitted data unit (e.g. a layer of a media frame in a layered encoding transport) indicating its temporal location in the presentation timeline. Such additional information may be for example a cross-layer sequence number or an additional timestamp information value that can be used at the receiver/intermediate network element to reconstruct the presentation order of media frames and layers within the frames.

This additional information adds overhead to each packet, and although it may be possible to use information with data fields smaller than an RTP Sequence Number (SN) and RTP TS (16 bits and 32 bits, respectively), any information added to each packet would still introduce additional overhead for each transmitted layer. For example, in the case of smallish pieces of speech/audio data (in the order of 10-20 bytes) even one additional byte of overhead per layer may have a significant effect on the overall system performance.

SUMMARY OF THE INVENTION

This application proposes a mechanism that facilitates efficient cross-layer synchronization with minimal overhead where multiple RTP sessions are used.

Embodiments of the present invention aim to address the above problem.

There is provided according to a first aspect of the present invention a packet generator for generating packets from an input signal configured to: generate at least one first signal, dependent on the input signal, the first signal comprising a first relative time value; generate at least one second signal, dependent on the input signal and associated with the at least one first signal; and generate at least one indicator associated with each of the at least one second signal, each indicator dependent on the first relative time value.

The first signal may comprise a core layer encoded signal.

Each of the at least one second signal may comprise an enhancement layer.

The indicator may comprise an identifier identifying the corresponding first signal.

The first relative time value may comprise a real-time transport protocol time stamp and wherein the identifier may comprise the real-time transport protocol time stamp of the first encoded signal.

Each second signal may comprise an associated second relative time value.

The indicator may comprise the difference between the first relative time value and the associated second relative time value.
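The two indicator variants just described (the time stamp of the corresponding first signal, or the difference between the first and second relative time values) may be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the Packet class, its field names and the make_frame_packets function are hypothetical, and the difference is taken modulo 2**32 on the assumption that both relative time values are 32-bit RTP time stamps.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TS_MODULUS = 1 << 32  # assumed 32-bit RTP time stamps


@dataclass
class Packet:
    session: str                     # hypothetical label: "core" or "enhancement"
    rtp_ts: int                      # relative time value of this session
    payload: bytes
    indicator: Optional[int] = None  # carried only by enhancement packets


def make_frame_packets(core_ts: int, enh_ts: int,
                       core_payload: bytes, enh_payload: bytes,
                       use_difference: bool = False) -> Tuple[Packet, Packet]:
    """Build the core and enhancement packets for one media frame."""
    core = Packet("core", core_ts, core_payload)
    if use_difference:
        # Variant 2: difference between the first and second relative time values.
        indicator = (core_ts - enh_ts) % TS_MODULUS
    else:
        # Variant 1: the time stamp of the corresponding core packet.
        indicator = core_ts
    enh = Packet("enhancement", enh_ts, enh_payload, indicator)
    return core, enh


core, enh = make_frame_packets(1000, 5000, b"core", b"enh")
core_d, enh_d = make_frame_packets(1000, 5000, b"core", b"enh", use_difference=True)
```

With either variant a receiver holding the enhancement packet can recover the core time stamp of the same frame: directly in the first case, and as the enhancement time stamp plus the indicator (modulo 2**32) in the second.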

The first signal may further comprise a sequence number; and wherein each indicator may comprise the sequence number value.

The first signal may further comprise a first identifier value dependent on the first relative time value.

Each indicator may comprise the first identifier value.

Each indicator may comprise at least one second identifier value, the second identifier value being preferably dependent on the first identifier value and the at least one second signal.

The packet generator for generating packets from the input signal may further be configured to determine if a synchronization point has been reached.

The packet generator is preferably configured to generate the indicator only if a synchronization point has been reached.

The synchronization point is preferably determined dependent on receiving a synchronization request.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting any of a predetermined number of packets, wherein the predetermined number of packets may comprise at least one of: the first predetermined number of packets of the plurality of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a second predetermined number of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a predetermined time period.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting a discontinuity in the stream of packets.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting the changing of the configuration of at least one of the first and second signals.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting a talk spurt or a restarting of a session.
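The synchronization point conditions listed above may be combined into a single decision, as in the following minimal Python sketch. The SyncPolicy class, its parameter names and the default values (3 initial packets, an interval of 50 packets) are hypothetical and chosen only for illustration. The variant based on a predetermined time period is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class SyncPolicy:
    """Decides whether a packet is a synchronization point (illustrative values)."""
    initial_packets: int = 3   # indicator sent with the first N packets
    packet_interval: int = 50  # and again every K packets thereafter

    def is_sync_point(self, packet_index: int,
                      sync_requested: bool = False,
                      discontinuity: bool = False,
                      config_changed: bool = False,
                      talk_spurt_start: bool = False) -> bool:
        # Event-driven triggers: a request, a discontinuity in the stream,
        # a configuration change, or the start of a talk spurt or session.
        if sync_requested or discontinuity or config_changed or talk_spurt_start:
            return True
        # The first predetermined number of packets of the stream.
        if packet_index < self.initial_packets:
            return True
        # Packets separated by a second predetermined number of packets.
        return packet_index % self.packet_interval == 0


policy = SyncPolicy()
sync_indices = [i for i in range(120) if policy.is_sync_point(i)]
```

Under these assumed values the indicator would accompany packets 0, 1, 2, 50 and 100 of the first 120 packets, so most packets carry no additional overhead.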

The first signal may comprise at least one of: an algebraic code excited linear prediction signal; a variable rate multi-rate encoded signal; and ITU-T G729 core layer signal.

The packet generator is preferably further configured to encode the input stream to generate the first signal and the second signal.

The input stream preferably comprises at least one of: audio data; video data; MIDI data; animated graphics data; and text data.

According to a second aspect of the invention there is provided a packet synchronizer for synchronizing an encoded signal, configured to: receive an encoded signal comprising at least one first encoded signal, comprising a first relative time value, at least one second encoded signal comprising a second relative time value, and at least one indicator associated with each of the at least one second encoded signal; synchronize each of the at least one second encoded signal with at least one of at least one first encoded signal dependent on the associated at least one indicator first relative time value.

The packet synchronizer for synchronizing an encoded signal may further be configured to decode each of the at least one second encoded signal synchronized with the at least one of the first encoded signals to generate a decoded signal.

The at least one first encoded signal may comprise at least one core layer audio encoded signal, and the at least one second encoded signal may comprise at least one enhancement layer audio encoded signal.

The indicator may comprise an identifier identifying one of the at least one first encoded signal.

Each first relative time value may comprise a first time stamp value, and the indicator may comprise the first time stamp value for one of the first encoded signals associated with one of the second encoded signals.

Each second relative time value may comprise a second time stamp value and the decoder is preferably further configured to determine a time offset value between the first time stamp value and the second time stamp value of the associated one of the second encoded signals.

The indicator may comprise the difference between the first relative time value and the associated second relative time value.

The first encoded signal may further comprise a sequence number; and wherein each indicator may comprise the sequence number value of the associated first encoded signal.

The first encoded signal may further comprise a first identifier value dependent on the first relative time value.

Each indicator may comprise the first identifier value.

Each indicator may comprise at least one second identifier value, the second identifier value may be dependent on the first identifier value and the at least one second encoded signal.
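The receiver-side use of the indicator may be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the EnhancementAligner class and its method names are hypothetical, the indicator is assumed to carry the first time stamp value of the associated core packet, and time stamps are assumed to be 32-bit RTP values. The time offset is determined once from an enhancement packet that carries an indicator, and is then applied to later enhancement packets that carry none.

```python
from typing import Optional

TS_MODULUS = 1 << 32  # assumed 32-bit RTP time stamps


class EnhancementAligner:
    """Maps enhancement-session time stamps into the core session's TS domain."""

    def __init__(self) -> None:
        # Core TS minus enhancement TS, modulo 2**32; unknown until an
        # indicator has been received.
        self.offset: Optional[int] = None

    def observe(self, enh_ts: int, indicator: Optional[int]) -> None:
        """Update the offset from an enhancement packet that carries an indicator."""
        if indicator is not None:
            self.offset = (indicator - enh_ts) % TS_MODULUS

    def to_core_ts(self, enh_ts: int) -> Optional[int]:
        """Return the core-domain TS for an enhancement packet, if known."""
        if self.offset is None:
            return None
        return (enh_ts + self.offset) % TS_MODULUS


aligner = EnhancementAligner()
before_sync = aligner.to_core_ts(5000)       # no indicator seen yet
aligner.observe(enh_ts=5000, indicator=1000)  # synchronization point
next_frame = aligner.to_core_ts(5320)         # later packet without indicator
```

Here the enhancement packet with time stamp 5320 is matched to the core packet with time stamp 1320, without waiting for any control protocol report.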

According to a third aspect of the invention there is provided a method for generating packets from an input signal comprising: generating at least one first signal, dependent on the input signal, the first signal comprising a first relative time value; generating at least one second signal, dependent on the input signal and associated with the at least one first signal; generating at least one indicator associated with each of the at least one second signal, each indicator dependent on the first relative time value.

The first signal may comprise a core layer encoded signal.

Each of the at least one second signal may comprise an enhancement layer.

The indicator may comprise an identifier identifying the corresponding first signal.

The first relative time value may comprise a real-time transport protocol time stamp and wherein the identifier comprises the real-time transport protocol time stamp of the first encoded signal.

Each second signal may comprise an associated second relative time value.

The indicator may comprise the difference between the first relative time value and the associated second relative time value.

The first signal may further comprise a sequence number; and wherein each indicator comprises the sequence number value.

The first signal may further comprise a first identifier value dependent on the first relative time value.

Each indicator may comprise the first identifier value.

Each indicator may comprise at least one second identifier value, the second identifier value may be dependent on the first identifier value and the at least one second signal.

The method for generating packets from the input signal may further comprise: determining if a synchronization point has been reached.

The method for generating packets from the input signal may further comprise generating the indicator only if a synchronization point has been reached.

The synchronization point is preferably determined dependent on receiving a synchronization request.

The at least a first signal preferably comprises a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting any of a predetermined number of packets, wherein the predetermined number of packets may comprise at least one of: the first predetermined number of packets of the plurality of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a second predetermined number of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a predetermined time period.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting a discontinuity in the stream of packets.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting the changing of the configuration of at least one of the first and second signals.

The at least a first signal may comprise a plurality of packets and wherein the synchronization point is preferably determined dependent on detecting a talk spurt or a restarting of a session.

The first signal may comprise at least one of: an algebraic code excited linear prediction signal; a variable rate multi-rate encoded signal; and ITU-T G729 core layer signal.

The method may further comprise encoding the input stream to generate the first signal and the second signal.

The input stream may further comprise at least one of: audio data; video data; MIDI data; animated graphics data; and text data.

According to a fourth aspect of the present invention there is provided a method for synchronizing an encoded signal, comprising: receiving an encoded signal comprising at least one first encoded signal, comprising a first relative time value, at least one second encoded signal comprising a second relative time value, and at least one indicator associated with each of the at least one second encoded signal; and synchronizing each of the at least one second encoded signal with at least one of at least one first encoded signal dependent on the associated at least one indicator first relative time value.

The method for synchronizing an encoded signal may further comprise decoding each of the at least one second encoded signal synchronized with the at least one of the first encoded signals to generate a decoded signal.

The at least one first encoded signal may comprise at least one core layer encoded signal, and the at least one second encoded signal may comprise at least one enhancement layer encoded signal.

The indicator may comprise an identifier identifying one of the at least one first encoded signal.

Each first relative time value may comprise a first time stamp value, and the indicator may comprise the first time stamp value for one of the first encoded signals associated with one of the second encoded signals.

Each second relative time value may comprise a second time stamp value and the method may further comprise determining a time offset value between the first time stamp value and the second time stamp value of the associated one of the second encoded signals.

The indicator may comprise the difference between the first relative time value and the associated second relative time value.

The first encoded signal may further comprise a sequence number; and wherein each indicator comprises the sequence number value of the associated first encoded signal.

The first encoded signal may further comprise a first identifier value dependent on the first relative time value.

Each indicator may comprise the first identifier value.

Each indicator may comprise at least one second identifier value, the second identifier value may be dependent on the first identifier value and the at least one second encoded signal.

An apparatus may comprise an encoder as featured above.

An apparatus may comprise a decoder as featured above.

An electronic device may comprise an encoder as featured above.

An electronic device may comprise a decoder as featured above.

A chipset may comprise an encoder as featured above.

A chipset may comprise a decoder as featured above.

According to a fifth aspect of the present invention there is provided a computer program product configured to perform a method for generating packets from an input signal comprising: generating at least one first signal, dependent on the input signal, the first signal comprising a first relative time value; generating at least one second signal, dependent on the input signal and associated with the at least one first signal; generating at least one indicator associated with each of the at least one second signal, each indicator dependent on the first relative time value.

According to a sixth aspect of the present invention there is provided a computer program product configured to perform a method for synchronizing an encoded signal, comprising: receiving an encoded signal comprising at least one first encoded signal, comprising a first relative time value, at least one second encoded signal comprising a second relative time value, and at least one indicator associated with each of the at least one second encoded signal; and synchronizing each of the at least one second encoded signal with at least one of at least one first encoded signal dependent on the associated at least one indicator first relative time value.

According to a seventh aspect of the present invention there is provided a packet generator for generating packets from an input signal comprising: first processing means for generating at least one first signal, dependent on the input signal, the first signal comprising a first relative time value; second processing means for generating at least one second signal, dependent on the input signal and associated with the at least one first signal; and third processing means for generating at least one indicator associated with each of the at least one second signal, each indicator dependent on the first relative time value.

According to an eighth aspect of the present invention there is provided a packet synchronizer for synchronizing packets comprising: receiving means for receiving an encoded signal comprising at least one first encoded signal, comprising a first relative time value, at least one second encoded signal comprising a second relative time value, and at least one indicator associated with each of the at least one second encoded signal; signal processing means for synchronizing each of the at least one second encoded signal with at least one of at least one first encoded signal dependent on the associated at least one indicator first relative time value.

BRIEF DESCRIPTION OF DRAWINGS

For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:

FIG. 1 shows schematically an electronic device employing embodiments of the invention;

FIG. 2 shows schematically an audio codec system employing embodiments of the present invention;

FIG. 3 shows schematically an encoder part of the audio codec system shown in FIG. 2;

FIG. 4 shows schematically a flow diagram illustrating the operation of an embodiment of the audio encoder as shown in FIG. 3 according to the present invention;

FIG. 5 shows schematically a transport block as used in an embodiment of the invention;

FIG. 6 shows schematically a decoder part of the audio codec system shown in FIG. 2; and

FIG. 7 shows a flow diagram illustrating the operation of an embodiment of the audio decoder as shown in FIG. 6 according to the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

Whilst the below examples and embodiments are described with respect to the audio encoding and decoding using RTP sessions it would be possible for the person skilled in the art to apply the same or similar operations to synchronize RTP sessions carrying media streams with other types of relationships than layered coding. For example the same approach may be applied to a set of media streams constituting the data for multi descriptive coding. Furthermore, the mechanism described below may also apply to any different types of media components transmitted in separate RTP sessions, for example audio and video coding.

The following describes in more detail possible mechanisms for the provision of a scalable audio coding system. In this regard reference is first made to FIG. 1 which shows a schematic block diagram of an exemplary electronic device 10, which may incorporate a codec according to an embodiment of the invention.

The electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.

The electronic device 10 comprises a microphone 11, which is linked via an analogue-to-digital converter 14 to a processor 21. The processor 21 is further linked via a digital-to-analogue converter 32 to loudspeakers 33. The processor 21 is further linked to a transceiver (TX/RX) 13, to a user interface (UI) 15 and to a memory 22.

The processor 21 may be configured to execute various program codes. The implemented program codes comprise an audio encoding code for encoding a combined audio signal and code to extract and encode side information pertaining to the spatial information of the multiple channels. The implemented program codes 23 further comprise an audio decoding code. The implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed. The memory 22 could further provide a section 24 for storing data, for example data that has been encoded in accordance with the invention.

The encoding and decoding code may in embodiments of the invention be implemented in hardware or firmware.

The user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display. The transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network.

It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.

A user of the electronic device 10 may use the microphone 11 for inputting speech that is to be transmitted to some other electronic device or that is to be stored in the data section 24 of the memory 22. A corresponding application has been activated to this end by the user via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.

The analogue-to-digital converter 14 converts the input analogue audio signal into a digital audio signal and provides the digital audio signal to the processor 21.

The processor 21 may then process the digital audio signal in the same way as described with reference to FIGS. 2 and 3.

The resulting bit stream is provided to the transceiver 13 for transmission to another electronic device. Alternatively, the coded data could be stored in the data section 24 of the memory 22, for instance for a later transmission or for a later presentation by the same electronic device 10.

The electronic device 10 could also receive a bit stream with correspondingly encoded data from another electronic device via its transceiver 13. In this case, the processor 21 may execute the decoding program code stored in the memory 22. The processor 21 decodes the received data, and provides the decoded data to the digital-to-analogue converter 32. The digital-to-analogue converter 32 converts the digital decoded data into analogue audio data and outputs them via the loudspeakers 33. Execution of the decoding program code could be triggered as well by an application that has been called by the user via the user interface 15.

The received encoded data could also be stored instead of an immediate presentation via the loudspeaker(s) 33 in the data section 24 of the memory 22, for instance for enabling a later presentation or a forwarding to still another electronic device.

It would be appreciated that the schematic structures described in FIGS. 2, 3, 5 and 6 and the method steps in FIGS. 4 and 7 represent only a part of the operation of a complete audio codec as exemplarily shown implemented in the electronic device shown in FIG. 1.

The general operation of audio codecs as employed by embodiments of the invention is shown in FIG. 2. General audio coding/decoding systems consist of an encoder and a decoder, as illustrated schematically in FIG. 2. Illustrated is a system 102 with an encoder 104, a storage or media channel 106 and a decoder 108.

The encoder 104 compresses an input audio signal 110 producing a bit stream 112, which is either stored or transmitted through a media channel 106. The bit stream 112 can be received within the decoder 108. The decoder 108 decompresses the bit stream 112 and produces an output audio signal 114. The bit rate of the bit stream 112 and the quality of the output audio signal 114 in relation to the input signal 110 are the main features, which define the performance of the coding system 102.

FIG. 3 depicts schematically an encoder 104 according to an exemplary embodiment of the invention. The encoder 104 comprises an input 203 which is arranged to receive an audio signal.

The audio signal input 203 is connected to a layer codec processor 205. The output from the layer codec processor 205 is passed to the core layer RTP session generator 209 and an input of an enhancement layer RTP session generator 211.

The core layer RTP session generator 209 is configured to have one output connected to an input of a network interface 213 and a further output connected to a further input of the enhancement layer RTP session generator 211. The enhancement layer RTP session generator 211 is configured to have an output connected to a further input of the network interface 213.

The network interface 213 may then be arranged to output the output theRTP packet streams 112 via the output 206.

The operation of these components is described in more detail withreference to the flow chart FIG. 4 showing the operation of the encoder104.

The audio signal is received by the encoder 104. In a first embodiment of the invention the audio signal is a digitally sampled signal. In other embodiments of the present invention the audio input may be an analogue audio signal, for example from a microphone 6, which is analogue to digital (A/D) converted. In further embodiments of the invention the audio input is converted from a pulse code modulation digital signal to an amplitude modulation digital signal. The receiving of the audio signal is shown in FIG. 4 by step 301.

The layer codec processor 205 receives the audio signal to be encoded and outputs the encoded parameters which represent the core level encoded signal. The layer codec processor 205 furthermore generates for internal use the synthesised audio signal (in other words the audio signal is encoded into parameters and then the parameters are decoded using the reciprocal process to produce the synthesised audio signal).

The layer codec processor 205 may use any appropriate encoding technique to generate the core layer.

In a first embodiment of the invention the layer codec processor 205 generates a core layer using an embedded variable bit rate codec (EV-VBR).

In other embodiments of the invention the core layer may be generated using algebraic code excited linear prediction (ACELP) encoding, the codec being configured to output a bitstream of typical ACELP parameters.

It is to be understood that embodiments of the present invention could equally use any audio or speech based codec to represent the core layer.

The generation of the core layer coded signal is shown in FIG. 4 by step 303.

The layer codec processor 205 uses the synthesized audio signal (in other words the audio signal is first encoded into parameters such as those described above and then decoded back into an audio signal within the same core layer codec). This synthesized signal is used within the layer codec processor 205 to generate the enhancement layer.

In one embodiment of the invention the synthesized signal and the audio signal are transformed into the frequency domain and the difference between the two frequency domain signals is then encoded to produce the enhancement layer data.
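By way of a non-limiting illustration, the frequency domain difference described above may be sketched as follows. The sketch assumes a naive discrete Fourier transform as a stand-in for whatever transform the codec actually uses, omits the subsequent encoding (quantization) of the difference, and uses hypothetical function names and sample values that do not appear in the embodiments.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform (O(n^2)); a stand-in for the codec's transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def enhancement_residual(original, synthesized):
    """Difference between the spectra of the original and the core-synthesized frame."""
    return [a - b for a, b in zip(dft(original), dft(synthesized))]


# Hypothetical 8-sample frame and a coarser core layer reconstruction of it.
original_frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
synthesized_frame = [0.0, 0.4, 0.9, 0.4, 0.0, -0.4, -0.9, -0.4]
residual = enhancement_residual(original_frame, synthesized_frame)

# A decoder holding the core spectrum can add the residual back to it.
restored = [c + r for c, r in zip(dft(synthesized_frame), residual)]
```

In this sketch the residual is what an enhancement layer encoder would go on to quantize; adding it back to the core layer spectrum recovers the spectrum of the original frame up to floating point error.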

In other embodiments of the invention ITU-T Embedded Variable Bit-Rate (EV-VBR) speech/audio codec enhancement layers and ITU-T Scalable Video Codec (SVC) enhancement layers may be generated.

Further embodiments may include but are not limited to Variable-Rate Multimode Wideband (VMR-WB), ITU-T G.729, ITU-T G.729.1, Adaptive Multi-Rate Wideband (AMR-WB), and Adaptive Multi-Rate Wideband Plus (AMR-WB+).

In other embodiments of the invention any suitable enhancement layer codec may be employed to extract the correlation between the synthesized signal and the original audio signal to generate an enhancement layer data signal.

The generation of the enhancement data is shown in FIG. 4 by step 305.

The core layer RTP session generator 209 receives the core layer signal data and is configured to generate the RTP session data associated with the core layer signal data. The RTP session data will typically comprise a packet with a header comprising a time stamp value (TS_(C)) which, as described above, marks a relative time that enables the packets of an RTP session to be reassembled in the correct order once received at the receiver.

The generation of the core layer RTP session packet is shown in FIG. 4 by step 307.

The enhancement layer RTP session generator 211 receives the enhancement layer data signal and initially generates an enhancement layer RTP session packet. The enhancement layer RTP packet will also comprise a header comprising a time stamp value (TS_(E)) which marks a relative time with respect to the enhancement layer and which enables the receiver to reconstruct the order of the enhancement layer RTP packets received.

The generation of the enhancement layer RTP session packet is shown in FIG. 4 by step 309.

The enhancement layer RTP session generator 211 furthermore checks whether the current packet being generated is at a synchronization point.

In a first embodiment of the invention the synchronization point is the first packet at the start of the session generation. In other words the enhancement layer RTP session generator determines if this enhancement layer session packet is the first to be transmitted.

In other embodiments of the invention the synchronization point is not only the first packet at the start of a generated session but also a predetermined number of subsequent packets. In this embodiment the addition of synchronization information to additional packets tolerates a larger error rate: one or more packets may not be received, yet the synchronization information may still reach the receiver.

In a further embodiment of the invention the synchronization point is taken to be every n'th packet. In other words a synchronization point is determined after a predetermined number of packets have been transmitted. For example the synchronization point is determined after every 100 packets. This predetermined number may be chosen dependent on the bit or packet error rate from the transmitter to the receiver.

In further embodiments of the invention the synchronization point is taken to be the packet generated or transmitted after a predetermined regular interval. This predetermined regular interval may be chosen dependent on the bit or packet error rate from the transmitter to the receiver, so as to balance the ability of the receiver to receive the synchronization information, as disclosed below, against the overhead incurred by transmitting the synchronization information.

In some embodiments of the invention there may be a feedback loop from the decoder/receiver 108 to the encoder/transmitter 104, which is configured to receive a request from the decoder/receiver 108 for synchronization information. On receipt of the request the enhancement layer RTP session generator may be configured to determine that a synchronization point has been reached. The synchronization point may be determined to remain in place for a predetermined number of packets following the receipt of the request from the decoder/receiver.

In some embodiments of the invention the enhancement layer RTP session generator 211 may be configured to determine that a synchronization point has been reached each time the encoder/transmitter 104 needs to introduce a discontinuity in the transmitted stream of packets. This occurs, for example, when the encoder needs to add new enhancement layers based on the media characteristics: switching an audio stream from mono to stereo may be implemented by including an enhancement layer containing the stereo bit stream as a separate RTP session. In addition, the computational resources available in the encoding terminal may vary, occasionally allowing higher bit rate encoding and thereby causing discontinuities. Furthermore, call on-hold functionality creates a discontinuity in the encoding.

Furthermore in some embodiments of the invention the enhancement layer RTP session generator 211 may be configured to determine that a synchronization point has been reached each time the transmitted layer configuration is changed to include a higher number of layers.

Furthermore in some embodiments of the invention the enhancement layer RTP session generator 211 may be configured to determine that a synchronization point has been reached each time a talk spurt (in a speech session) is started.

In some embodiments of the invention the enhancement layer RTP session generator 211 may be configured to determine that a synchronization point has been reached each time the session is re-started after a period of inactivity.

The detection of the synchronization point is shown in FIG. 4 by step 311.
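By way of a non-limiting illustration, several of the synchronization point criteria described above (the first packets of a session, every n'th packet, and events such as a receiver request, a layer configuration change, a talk spurt or a session restart) may be combined in a single decision routine. The following sketch is written under assumptions: the class name, its fields and its default values are hypothetical and are not taken from the embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPointPolicy:
    """Hypothetical policy deciding whether the next packet is a synchronization point."""

    initial_packets: int = 1   # mark the first N packets of the session
    every_nth: int = 0         # mark every n'th packet; 0 disables periodic marking
    hold_after_event: int = 1  # packets marked after a request or discontinuity
    _sent: int = field(default=0, init=False)
    _hold: int = field(default=0, init=False)

    def notify_event(self) -> None:
        """Call on a receiver request, layer change, talk spurt start or session restart."""
        self._hold = self.hold_after_event

    def is_sync_point(self) -> bool:
        """Decide for the next packet, then advance the packet counter."""
        index = self._sent
        self._sent += 1
        if self._hold > 0:
            self._hold -= 1
            return True
        if index < self.initial_packets:
            return True
        return self.every_nth > 0 and index % self.every_nth == 0


policy = SyncPointPolicy(initial_packets=2, every_nth=5)
flags = [policy.is_sync_point() for _ in range(11)]
```

With these illustrative settings, packets 0, 1, 5 and 10 are marked; a call to `notify_event()` additionally marks the next `hold_after_event` packets, mirroring the embodiment in which the synchronization point remains in place for a predetermined number of packets after a request.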

If a synchronization point is determined to have been reached then the procedure inserts a synchronization element.

The enhancement layer RTP session generator 211 inserts a synchronization element into the enhancement layer RTP session packet generated in step 309.

In a first embodiment of the invention the enhancement layer RTP session generator 211 inserts in the current enhancement layer RTP session packet, as a synchronization element, the current value of the RTP TS of the RTP session carrying the corresponding core layer data. In other words the value of TS_(C) is inserted into a packet of the (each) RTP session carrying the enhancement layer data whenever synchronization data needs to be provided to the receiver.

This would enable, as is described below, the decoder/receiver to compute the offset between the core layer RTP TS (TS_(C)) and the RTP TS of each RTP session carrying the enhancement layer data (TS_(E)). Where in embodiments of the invention there are multiple enhancement layers (not shown in FIG. 3), each enhancement layer RTP session generator may insert the core layer RTP TS value associated with the frame of the enhancement layer signal, and therefore each session can similarly use the TS_(C) value to synchronize each layer at the decoder/receiver 108.

In further embodiments of the invention each enhancement layer RTP session generator receives the core layer RTP session TS (TS_(C)) and generates an offset value, namely the difference between the core layer RTP TS value and the enhancement layer RTP TS value in a packet of the (each) RTP session carrying enhancement layer data. The enhancement layer RTP session generator 211 then inserts the TS offset value as the synchronization element into an enhancement layer RTP session packet in each RTP session carrying the layers of the corresponding core layer data.

In yet another embodiment the RTP session generator includes a value of the common timing reference as the synchronization element in a packet of each of the RTP sessions carrying layers of the same media frame. An example of such a common time reference is the NTP timestamp, which is also carried in the RTCP sender and receiver reports.

In a further embodiment of the invention the enhancement layer RTP session generator 211 inserts, as the synchronization element, the RTP sequence number (SN) value for the RTP session carrying the core layer data in a packet of the (each) RTP session carrying the enhancement layer data from the same media frame. This would enable, as described below, the decoder/receiver 108 to find the corresponding RTP TS values and compute the TS offsets required for cross-session synchronization.

In further embodiments of the invention the enhancement layer RTP session generator 211 inserts an identifier, as a synchronization element, with the same value in a packet of each RTP session representing a layer or layers of the same media frame.

In some embodiments of the invention the identifier value may change from layer to layer (from RTP session to RTP session) according to a predetermined pattern.

With respect to FIG. 5, two formats of RTP packets with inserted synchronization elements are shown.

As is known in the art the proposed EV-VBR codec RTP payload introduces a concept of transport block, which carries one or several layers from one or several encoded audio frames. Each transport block is independent, consisting of a layer identification (L-ID) value indicating the layer configuration carried in the transport block, information on the number of frames included in the transport block, and the actual encoded audio data. Thus, the decoder/receiver 108 may use the L-ID value to uniquely identify the layers that are carried in each of the received transport blocks.

In an embodiment of the invention an L-ID value is allocated to indicate a transport block carrying the inserted cross-layer synchronization element instead of encoded audio data. The transport block carrying the synchronization element may be transmitted together with the transport block(s) carrying the actual EV-VBR audio data.

Thus as can be seen in FIG. 5(a) the RTP session generators 209/211 may output a transport block 401, which according to such an embodiment of the invention comprises an RTP header 403 comprising the RTP header information, a first layer identification value (L-ID=x) 405 which has a value of ‘x’ indicating the presence of audio data, the audio data 407 for the frame or frames for the layer or layers, a second layer identification value (L-ID=y) 409 which contains an identifier indicating that the synchronization element is enclosed and follows, and the synchronization element 411.

The example in FIG. 5(a) shows an embodiment where the synchronization element is transmitted in the same packet with only a single transport block containing audio data. The packet may also contain several transport blocks carrying audio data.

In a further embodiment of the invention the synchronization element is included as part of the data stream but as its own RTP packet. As can be seen in FIG. 5(b), the RTP session generators 209/211 may output a first RTP packet 451 comprising an RTP header 461 comprising the RTP header information, a first layer identification value (L-ID=x) 463 which has a value of ‘x’ indicating the presence of audio data, and the audio data 465 for the frame or frames for the layer or layers. The network interface 213 also outputs a second related RTP packet 453 comprising a second packet RTP header 467 comprising the RTP header information, a second layer identification value (L-ID=y) 469 which has a value of ‘y’, and the synchronization element 471.
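By way of a non-limiting illustration, the two packet arrangements of FIG. 5(a) and FIG. 5(b) may be sketched as payload builders. The byte layout used here (a one-byte L-ID, a two-byte length field and a 32-bit synchronization element carrying the core layer time stamp) and the two L-ID constants are assumptions made only for the sketch; the actual EV-VBR payload format is not reproduced here, and the RTP header is omitted.

```python
import struct

L_ID_AUDIO = 0x01  # hypothetical stand-in for "L-ID = x" (audio data follows)
L_ID_SYNC = 0x7F   # hypothetical stand-in for "L-ID = y" (synchronization element follows)


def pack_block(l_id: int, body: bytes) -> bytes:
    """One transport block: 1-byte L-ID, 2-byte big-endian length, then the body."""
    return struct.pack("!BH", l_id, len(body)) + body


def pack_sync_block(ts_core: int) -> bytes:
    """Transport block whose body is the 32-bit core layer RTP time stamp."""
    return pack_block(L_ID_SYNC, struct.pack("!I", ts_core & 0xFFFFFFFF))


def unpack_blocks(payload: bytes):
    """Split a payload into (L-ID, body) tuples."""
    blocks = []
    pos = 0
    while pos < len(payload):
        l_id, length = struct.unpack_from("!BH", payload, pos)
        pos += 3
        blocks.append((l_id, payload[pos:pos + length]))
        pos += length
    return blocks


# FIG. 5(a) style: audio data and synchronization element share one payload.
payload_a = pack_block(L_ID_AUDIO, b"audio-frame") + pack_sync_block(160000)

# FIG. 5(b) style: the synchronization element travels in its own packet.
payload_b1 = pack_block(L_ID_AUDIO, b"audio-frame")
payload_b2 = pack_sync_block(160000)
```

A receiver using this sketch would call `unpack_blocks` on each payload and treat any block tagged `L_ID_SYNC` as a synchronization element rather than as audio data.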

In both cases the timing associated with the synchronization element is defined, according to the embodiments of the invention, by the RTP TS of the packet carrying the cross-layer synchronization element (alternatively, the timing could be specified to be computed from the RTP packet header information using a predefined method; for example, it can be specified to have the same timing as the audio data block preceding or following it).

Typically the amount of information needed for the synchronization element is small compared to the size of the actual media data. However, in case of strict bandwidth limitations, especially in embodiments requiring the inclusion of the synchronization element only on RTP sessions conveying enhancement layer data, it may be possible to completely replace the enhancement layer audio data with the synchronization element in those few frames where it is needed. This does not prevent media rendering in the receiver since reception of the core layer audio data ensures successful decoding, although the audio quality may be temporarily degraded.

Furthermore, in all embodiments it is possible to send the synchronization element during inactive source signals, if applicable. For example in embodiments using the EV-VBR codec, the synchronization element may be transmitted during non-speech periods, when this is feasible from the application point of view, so as to avoid using additional bandwidth during speech periods.

The enhancement layer RTP session generator may then pass the generated enhancement layer RTP packets to the network interface 213.

The network interface 213 on receiving the packets from the core layer RTP session generator 209 and the enhancement layer RTP session generator 211 outputs or stores the RTP packets.

To further assist the understanding of the invention the operation of the decoder 108 with respect to the embodiments of the invention is shown with respect to the decoder schematically shown in FIG. 6 and the flow chart showing the operation of the decoder in FIG. 7.

The decoder comprises an input 502 from which the encoded packet stream 112 may be received. The input 502 is connected to the packet receiver 501.

The packet receiver 501 is configured to forward the received packet stream 112 as at least two separate packet streams. The first packet stream from the receiver is passed to the core layer processor 503, and the second packet stream is passed to the enhancement layer processor 505.

This unpacking process is shown in FIG. 7 by step 601.

The core layer processor 503 receives the core layer packets and is configured to pass core layer data and a core layer timing element associated with the core layer data to the core layer decoder 507.

The extraction of the timing element TS_(C) and the core layer data is shown in FIG. 7 by step 603.

The enhancement layer processor 505 receives the enhancement layer packets and is configured to pass enhancement layer data and an enhancement layer timing element TS_(E) associated with the enhancement layer data to the enhancement layer decoder 509.

The extraction of the timing element TS_(E) and the enhancement layer data is shown in FIG. 7 by step 605.

The enhancement layer processor 505 furthermore detects if there is a synchronization element within the enhancement layer packets or associated with the packets.

If there is no synchronization element the method passes to the decoding steps 611/613.

If there are synchronization elements within the enhancement layer packets or associated with the packets then the enhancement layer processor 505 extracts the synchronization element from the layer packets.

The extraction of the synchronization element is shown in FIG. 7 by step 607.

The enhancement layer processor on extracting the synchronization element then determines the offset between the core layer and the enhancement layer.

In the embodiment of the invention as described above, where the core layer RTP packet time stamp TS_(C) is inserted into the enhancement layer packet, the offset can be determined by a simple subtraction, as in the following example.

Let us assume we have three RTP sessions, one carrying the core layer data (RTP₁) and the two others, one carrying enhancement layer 1 (RTP₂) and the second carrying enhancement layer 2 (RTP₃). At time t the encoder/transmitter 104 sends the core layer of encoded frame n in RTP₁ with the standard timestamp value TS_(t1). To provide the synchronization information for the enhancement layers, a synchronization element containing the corresponding timestamp value in RTP₁, i.e. TS_(t1), is included in the payload carrying enhancement layers 1 (in RTP₂) and 2 (in RTP₃) for encoded frame n. Note that these packets themselves will have standard RTP timestamp values TS_(t2) and TS_(t3) in their RTP headers, respectively.

On receiving the synchronization element(s) the decoder/receiver 108 can now find all the enhancement layers that correspond to any given core layer data by comparing the RTP TS of the core layer data and the synchronization information consisting of a core layer time stamp transmitted within the enhancement layer payloads. Most importantly, the decoder/receiver 108 can determine the offset between the core layer TS and the enhancement layer 1 TS by TS_(o2)=TS_(t1)−TS_(t2). This would provide the required information for the cross-layer synchronization: the enhancement layer 1 for encoded frame n has the timestamp TS_(t2), and the core layer for the same encoded frame has the timestamp value TS_(t2)+TS_(o2). Or in other words, any subsequent timestamp of the RTP₂ can be brought to the RTP₁ timeline by adding TS_(o2) to the TS value received in the RTP header of an RTP₂ packet.

In a similar manner the offset between the core layer TS and the enhancement layer 2 TS can be computed by TS_(o3)=TS_(t1)−TS_(t3), and subsequent timestamps are transformed to match the RTP₁ timestamps by adding TS_(o3).
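By way of a non-limiting illustration, the three-session example above may be worked through numerically. The sketch assumes 32-bit RTP time stamps that wrap around, so the subtraction and addition are performed modulo 2^32; the function names and the numeric time stamp values are hypothetical and chosen only to exercise the wrap-around case.

```python
RTP_TS_MODULUS = 1 << 32  # assumption: 32-bit RTP time stamps that wrap around


def ts_offset(ts_core: int, ts_enh: int) -> int:
    """Offset TS_o = TS_core - TS_enh, kept within the 32-bit time stamp range."""
    return (ts_core - ts_enh) % RTP_TS_MODULUS


def to_core_timeline(ts_enh: int, offset: int) -> int:
    """Map an enhancement layer time stamp onto the core layer (RTP1) timeline."""
    return (ts_enh + offset) % RTP_TS_MODULUS


# Hypothetical time stamps for encoded frame n in RTP1, RTP2 and RTP3.
ts_t1 = 4_000_000_000
ts_t2 = 1_000
ts_t3 = 4_294_967_000

ts_o2 = ts_offset(ts_t1, ts_t2)  # synchronization element received on RTP2
ts_o3 = ts_offset(ts_t1, ts_t3)  # synchronization element received on RTP3

# A later frame, 320 time stamp units after frame n, on each enhancement session.
later_rtp2 = to_core_timeline((ts_t2 + 320) % RTP_TS_MODULUS, ts_o2)
later_rtp3 = to_core_timeline((ts_t3 + 320) % RTP_TS_MODULUS, ts_o3)
```

In this sketch both later enhancement layer time stamps map to `ts_t1 + 320` on the RTP₁ timeline, even though the RTP₃ time stamp wraps past 2^32 in between.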

Similarly the offset may be determined by examining the synchronization elements in other embodiments of the invention, such as reading the offset value, or by comparing the received sequence number in the synchronization elements with the sequence number of the received core layer packet, or by comparing the identifier on the enhancement layer packet with the core layer packet.
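By way of a non-limiting illustration, the sequence number variant may be sketched as a lookup at the decoder/receiver: the time stamps of received core layer packets are remembered by sequence number, and the sequence number carried in the synchronization element selects the core layer time stamp from which the offset is computed. The class and method names are hypothetical, and the sketch assumes 16-bit RTP sequence numbers and 32-bit RTP time stamps.

```python
from typing import Dict, Optional


class CoreTimestampIndex:
    """Hypothetical receiver-side index of core layer time stamps by sequence number."""

    def __init__(self) -> None:
        self._ts_by_sn: Dict[int, int] = {}

    def record_core_packet(self, sn: int, ts: int) -> None:
        """Remember the time stamp of a received core layer packet."""
        self._ts_by_sn[sn & 0xFFFF] = ts & 0xFFFFFFFF

    def offset_for(self, core_sn: int, ts_enh: int) -> Optional[int]:
        """Offset for an enhancement packet whose synchronization element is core_sn.

        Returns None when the referenced core layer packet has not been received.
        """
        ts_core = self._ts_by_sn.get(core_sn & 0xFFFF)
        if ts_core is None:
            return None
        return (ts_core - ts_enh) % (1 << 32)


index = CoreTimestampIndex()
index.record_core_packet(sn=41, ts=96_000)
offset = index.offset_for(core_sn=41, ts_enh=12_000)
missing = index.offset_for(core_sn=42, ts_enh=12_320)
```

Returning `None` for an unknown sequence number reflects one possible design choice: the receiver simply waits for a later synchronization element rather than guessing an offset.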

The determination of the offset is shown in FIG. 7 by step 609.

The core layer decoder 507 performs the complementary operation to that performed by the layer codec processor 205 on the core layer data to generate a synthetic audio signal for the core layer. Furthermore the core layer synthetic signal is used internally to generate the enhancement layer(s).

The decoding of the core layer is shown in FIG. 7 by step 611.

The enhancement layer decoder 509 furthermore performs the complementary operation to that performed by the layer codec processor 205 to generate a synthetic enhancement layer signal.

In embodiments of the invention where the decoding of the synthetic enhancement layer signal requires data from the synthetic core layer signal, the enhancement layer signal from the enhancement layer processor is synchronized using the offset information passed from the enhancement layer processor.

The synchronizing and decoding of the enhancement layer is shown in FIG. 7 by step 613.
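By way of a non-limiting illustration, the synchronization of enhancement layer data with core layer data using a previously determined offset may be sketched as a pairing step performed before decoding. The function name, the dictionary-based buffers and the frame contents are hypothetical, and 32-bit RTP time stamps are assumed.

```python
RTP_TS_MODULUS = 1 << 32  # assumption: 32-bit RTP time stamps


def pair_layers(core_frames, enh_frames, offset):
    """Pair each core frame with the enhancement frame of the same media frame.

    core_frames maps core layer time stamps to data, enh_frames maps enhancement
    layer time stamps to data, and offset is TS_core - TS_enh modulo 2**32.
    A missing enhancement frame yields None, so the core layer can still be decoded.
    """
    paired = []
    for ts_core, core_data in core_frames.items():
        ts_enh = (ts_core - offset) % RTP_TS_MODULUS
        paired.append((ts_core, core_data, enh_frames.get(ts_enh)))
    return paired


# Hypothetical buffers: the enhancement packet for frame n was lost in transit.
core_frames = {1000: b"core-n", 1320: b"core-n+1"}
enh_frames = {50320: b"enh-n+1"}
offset = (1000 - 50000) % RTP_TS_MODULUS

paired = pair_layers(core_frames, enh_frames, offset)
```

In this sketch the first core frame is paired with `None` and would be decoded from the core layer alone, while the second is paired with its enhancement layer data.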

The synthetic enhancement layer signal is then combined together with the synthetic core layer signal to generate the synthetic audio signal on the output 506.

In some embodiments of the invention which do not share common time critical data relating to the decoding process the synchronization of the synthetic signals may be carried out using the offset information after the decoding of the enhancement layer.

It is to be understood that even though the present invention has been described by way of example in terms of a core layer and a single enhancement layer, the present invention may be applied to further enhancement layers.

The embodiments of the invention described above describe the codec in terms of separate encoder 104 and decoder 108 apparatus in order to assist the understanding of the processes involved. However, it would be appreciated that the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation. Furthermore in some embodiments of the invention the coder and decoder may share some or all common elements.

As mentioned previously, although the above process describes a single core audio encoded signal and a single enhancement layer audio encoded signal, the same approach may be applied to synchronize any two media streams using the same or similar packet transmission protocols.

Although the above examples describe embodiments of the invention operating within a codec within an electronic device 610, it would be appreciated that the invention as described below may be implemented as part of any variable rate/adaptive rate audio (or speech) codec. Thus, for example, embodiments of the invention may be implemented in an audio codec which may implement audio coding over fixed or wired communication paths.

Thus user equipment may comprise an audio codec such as those described in embodiments of the invention above.

It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.

Furthermore elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.

In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

For example the embodiments of the invention may be implemented as a chipset, in other words a series of integrated circuits communicating among each other. The chipset may comprise microprocessors arranged to run code, application specific integrated circuits (ASICs), or programmable digital signal processors for performing the operations described above.

The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.

Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, Calif. and Cadence Design, of San Jose, Calif. automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

The invention claimed is:
 1. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory, the at least one processor, and the computer program code configured to cause the apparatus to at least: generate packets from an input signal; generate at least one core layer encoded signal, based at least in part on the input signal, the core layer encoded signal comprising a first relative time value; generate at least one enhancement layer signal, based at least in part on the input signal and associated with the at least one core layer encoded signal; generate at least one indicator associated with each of the at least one enhancement layer signal, each of the at least one indicator based at least in part on the first relative time value; and determine if a synchronization point has been reached, wherein the at least one indicator comprises an identifier identifying the at least one core layer encoded signal, and wherein the at least one indicator is generated only if the synchronization point has been reached.
 2. The apparatus of claim 1, wherein the first relative time value comprises a real-time transport protocol time stamp, and wherein the identifier comprises the real-time transport protocol time stamp of the at least one core layer encoded signal.
 3. The apparatus of claim 1, wherein each of the at least one enhancement layer signal comprises an associated second relative time value, and wherein the at least one indicator comprises a difference between the first relative time value and the associated second relative time value.
 4. The apparatus of claim 1, wherein the at least one core layer encoded signal further comprises: a sequence number, wherein each of the at least one indicator comprises the sequence number, and a first identifier value based at least in part on the first relative time value.
 5. The apparatus of claim 4, wherein each of the at least one indicator comprises the first identifier value.
 6. The apparatus of claim 4, wherein each of the at least one indicator comprises at least one second identifier value, the at least one second identifier value being based at least in part on the first identifier value and the at least one enhancement layer signal.
 7. The apparatus of claim 1, wherein the synchronization point is determined based at least in part on receiving a synchronization request.
 8. The apparatus of claim 1, wherein the at least one core layer encoded signal comprises a plurality of packets, wherein the synchronization point is determined based at least in part on detecting any of a predetermined number of packets, and wherein the predetermined number of packets comprise at least one of: a first predetermined number of packets of the plurality of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a second predetermined number of packets; and at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a predetermined time period.
 9. The apparatus of claim 1, wherein the at least one core layer encoded signal comprises a plurality of packets, and wherein the synchronization point is determined based at least in part on detecting one of the following: a discontinuity in a stream of packets; a changing of a configuration of at least one of the core layer encoded signal and at least one of the enhancement layer signal; and a talk spurt or a restarting of a session.
 10. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory, the at least one processor, and the computer program code configured to cause the apparatus to at least: receive an encoded signal comprising at least one core layer encoded signal, at least one enhancement layer encoded signal, and at least one indicator, the at least one core layer encoded signal comprising a first relative time value, the at least one enhancement layer encoded signal comprising a second relative time value, and the at least one indicator associated with each of the at least one enhancement layer encoded signal; synchronize each of the at least one enhancement layer encoded signal with at least one of the at least one core layer encoded signal based at least in part on the associated at least one indicator first relative time value; and decode each of the at least one enhancement layer encoded signal synchronized with the at least one core layer encoded signal to generate a decoded signal, wherein the at least one indicator comprises an identifier identifying one of the at least one core layer encoded signal, and wherein the at least one indicator comprises a difference between the first relative time value and the second relative time value.
 11. Theapparatus of claim 10, wherein each first relative time value comprisesa first time stamp value, and wherein the at least one indicatorcomprises the first time stamp value for one of the at least one corelayer encoded signal associated with one of the at least one enhancementlayer encoded signal.
 12. The apparatus of claim 11, wherein each secondrelative time value comprises a second time stamp value, and wherein theapparatus is further configured to determine a time offset value betweenthe first time stamp value and the second time stamp value of theassociated one of the at least one enhancement layer encoded signal. 13.The apparatus of claim 10, wherein the at least one core layer encodedsignal further comprises a sequence number, and wherein each of the atleast one indicator comprises the sequence number of the associated atleast one core layer encoded signal.
 14. The apparatus of claim 10, wherein the at least one core layer encoded signal further comprises a first identifier value based at least in part on the first relative time value, wherein each of the at least one indicator comprises the first identifier value, and wherein each of the at least one indicator comprises at least one second identifier value, the second identifier value being based at least in part on the first identifier value and the at least one enhancement layer encoded signal.
 15. A method comprising: generating, by at least one processor, packets from an input signal; generating, by the at least one processor, at least one core layer encoded signal, based at least in part on the input signal, the core layer encoded signal comprising a first relative time value; generating, by the at least one processor, at least one enhancement layer signal, based at least in part on the input signal and associated with the at least one core layer encoded signal; generating, by the at least one processor, at least one indicator associated with each of the at least one enhancement layer signal, each of the at least one indicator based at least in part on the first relative time value; and determining, by the at least one processor, if a synchronization point has been reached, wherein the at least one indicator comprises an identifier identifying the at least one core layer encoded signal, and wherein the at least one indicator is generated only if the synchronization point has been reached.
 16. The method of claim 15, wherein the first relative time value comprises a real-time transport protocol time stamp, and wherein the identifier comprises the real-time transport protocol time stamp of the at least one core layer encoded signal.
 17. The method of claim 15, wherein each of the at least one enhancement layer signal comprises an associated second relative time value, and wherein the at least one indicator comprises a difference between the first relative time value and the associated second relative time value.
 18. The method of claim 15, wherein the at least one core layer encoded signal further comprises: a sequence number, wherein each of the at least one indicator comprises the sequence number; and a first identifier value based at least in part on the first relative time value.
 19. The method of claim 18, wherein each of the at least one indicator comprises the first identifier value.
 20. The method of claim 18, wherein each of the at least one indicator comprises at least one second identifier value, the at least one second identifier value being based at least in part on the first identifier value and the at least one enhancement layer signal.
 21. The method of claim 15, wherein the synchronization point is determined based at least in part on receiving a synchronization request.
 22. The method of claim 15, wherein the at least one core layer encoded signal comprises a plurality of packets, wherein the synchronization point is determined based at least in part on detecting any of a predetermined number of packets, and wherein the predetermined number of packets comprise at least one of: a first predetermined number of packets of the plurality of packets; at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a second predetermined number of packets; and at least one of the predetermined number of packets separated from a second of the predetermined number of packets by a predetermined time period.
 23. The method of claim 15, wherein the at least one core layer encoded signal comprises a plurality of packets, and wherein the synchronization point is determined based at least in part on detecting one of the following: a discontinuity in a stream of packets; a changing of a configuration of at least one of the core layer encoded signal and at least one of the enhancement layer signal; and a talk spurt or a restarting of a session.
 24. A method comprising: receiving, by at least one processor, an encoded signal comprising at least one core layer encoded signal, at least one enhancement layer encoded signal, and at least one indicator, the at least one core layer encoded signal comprising a first relative time value, the at least one enhancement layer encoded signal comprising a second relative time value, and the at least one indicator associated with each of the at least one enhancement layer encoded signal; synchronizing, by the at least one processor, each of the at least one enhancement layer encoded signal with at least one of the at least one core layer encoded signal based at least in part on the associated at least one indicator first relative time value; and decoding, by the at least one processor, each of the at least one enhancement layer encoded signal synchronized with the at least one core layer encoded signal to generate a decoded signal, wherein the at least one indicator comprises an identifier identifying one of the at least one core layer encoded signal, and wherein the at least one indicator comprises a difference between the first relative time value and the second relative time value.
 25. The method of claim 24, wherein each first relative time value comprises a first time stamp value, and wherein the at least one indicator comprises the first time stamp value for one of the at least one core layer encoded signal associated with one of the at least one enhancement layer encoded signal.
 26. The method of claim 25, wherein each second relative time value comprises a second time stamp value, and wherein the method further comprises determining a time offset value between the first time stamp value and the second time stamp value of the associated one of the at least one enhancement layer encoded signal.
 27. The method of claim 24, wherein the at least one core layer encoded signal further comprises a sequence number, and wherein each of the at least one indicator comprises the sequence number of the associated at least one core layer encoded signal.
 28. The method of claim 24, wherein the at least one core layer encoded signal further comprises a first identifier value based at least in part on the first relative time value, wherein each of the at least one indicator comprises the first identifier value, and wherein each of the at least one indicator comprises at least one second identifier value, the second identifier value being based at least in part on the first identifier value and the at least one enhancement layer encoded signal.
 29. A computer program product comprising a non-transitory computer-readable medium encoded with instructions that, when executed by at least one processor, perform at least the following: generating packets from an input signal; generating at least one core layer encoded signal, based at least in part on the input signal, the core layer encoded signal comprising a first relative time value; generating at least one enhancement layer signal, based at least in part on the input signal and associated with the at least one core layer encoded signal; generating at least one indicator associated with each of the at least one enhancement layer signal, each of the at least one indicator based at least in part on the first relative time value; and determining if a synchronization point has been reached, wherein the at least one indicator comprises an identifier identifying the at least one core layer encoded signal, and wherein the at least one indicator is generated only if the synchronization point has been reached.
 30. A computer program product comprising a non-transitory computer-readable medium encoded with instructions that, when executed by at least one processor, perform at least the following: receiving an encoded signal comprising at least one core layer encoded signal, at least one enhancement layer encoded signal, and at least one indicator, the at least one core layer encoded signal comprising a first relative time value, the at least one enhancement layer encoded signal comprising a second relative time value, and the at least one indicator associated with each of the at least one enhancement layer encoded signal; synchronizing each of the at least one enhancement layer encoded signal with at least one of the at least one core layer encoded signal based at least in part on the associated at least one indicator first relative time value; and decoding each of the at least one enhancement layer encoded signal synchronized with the at least one core layer encoded signal to generate a decoded signal, wherein the at least one indicator comprises an identifier identifying one of the at least one core layer encoded signal, and wherein the at least one indicator comprises a difference between the first relative time value and the second relative time value.
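
By way of informal illustration only, and forming no part of the claims, the receiver-side synchronization recited in claims 24 to 26 might be sketched in Python as follows. The sketch assumes that the core layer and the enhancement layer are carried in separate real-time transport protocol sessions whose 32-bit time stamps share a clock rate but start from unrelated offsets, and that the indicator carries the core layer time stamp as the identifier together with the time stamp difference. All class, function, and field names (CorePacket, EnhancementPacket, Indicator, make_indicator, Synchronizer) are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional

RTP_TS_MODULUS = 1 << 32  # RTP time stamps are 32-bit values that wrap around


@dataclass
class CorePacket:
    timestamp: int  # first relative time value (core session time stamp)


@dataclass
class Indicator:
    core_timestamp: int  # identifier: time stamp of the associated core packet
    ts_difference: int   # (core time stamp - enhancement time stamp) mod 2**32


@dataclass
class EnhancementPacket:
    timestamp: int                         # second relative time value
    indicator: Optional[Indicator] = None  # assumed present only at sync points


def make_indicator(core: CorePacket, enh_timestamp: int) -> Indicator:
    """Sender side: build the indicator for an enhancement packet."""
    diff = (core.timestamp - enh_timestamp) % RTP_TS_MODULUS
    return Indicator(core_timestamp=core.timestamp, ts_difference=diff)


class Synchronizer:
    """Receiver side: map enhancement time stamps onto the core timeline."""

    def __init__(self) -> None:
        self.offset: Optional[int] = None  # unknown until an indicator arrives

    def to_core_timeline(self, pkt: EnhancementPacket) -> Optional[int]:
        # Refresh the stored offset whenever a packet carries an indicator.
        if pkt.indicator is not None:
            self.offset = pkt.indicator.ts_difference
        if self.offset is None:
            return None  # cannot align yet; no synchronization point seen
        return (pkt.timestamp + self.offset) % RTP_TS_MODULUS


# Usage: the first enhancement packet carries an indicator; later ones reuse it.
core = CorePacket(timestamp=1000)
sync = Synchronizer()
first = EnhancementPacket(500000, make_indicator(core, 500000))
aligned_first = sync.to_core_timeline(first)
aligned_next = sync.to_core_timeline(EnhancementPacket(500320))
```

In this sketch the modular arithmetic keeps the mapping valid across time stamp wrap-around, and an enhancement packet received before any indicator is simply reported as not yet alignable; a real receiver would choose its own policy for such packets.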